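Multi-person pose understanding from RGB videos involves three complex tasks: pose estimation, tracking, and motion forecasting. Among these three tasks, pose estimation and tracking are correlated, and tracking is crucial to motion forecasting. Most existing works either focus on a single task or employ cascaded methods to solve each task separately. In this paper, we propose Snipper, a framework that performs multi-person 3D pose estimation, tracking, and motion forecasting simultaneously in a single inference. Specifically, we first propose a deformable attention mechanism to aggregate spatiotemporal information from video snippets. Building on this deformable attention, a visual transformer is learned to encode the spatiotemporal features from multi-frame images and to decode informative pose features that update multi-person pose queries. Finally, these queries are regressed to predict multi-person pose trajectories and future motions in one forward pass. In the experiments, we show the effectiveness of Snipper on three challenging public datasets, where a generic model rivals specialized state-of-the-art baselines for pose estimation, tracking, and forecasting. Code is available at \href{https://github.com/jimmyzou/snipper}{https://github.com/jimmyzou/snipper}.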
translated by 谷歌翻译
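Human brains are known to speed up visual recognition of repeatedly presented objects through faster memory encoding and access procedures on activated neurons. For the first time, we borrow this capability and distill it into a semantic memory design, namely SMTM, to improve on-device CNN inference. SMTM employs a hierarchical memory architecture to exploit the long-tail distribution of objects of interest, and further incorporates several novel techniques to put it into effect: (1) it encodes high-dimensional feature maps into low-dimensional semantic vectors for low-cost yet accurate caching and lookup; (2) it uses a metric that accounts for the inherent characteristics of different layers to determine the exit timing; (3) it adaptively adjusts the cache size and semantic vectors to fit the scene dynamics. SMTM is prototyped on a commodity CNN engine and runs on both mobile CPU and GPU. Extensive experiments on large-scale datasets and models show that SMTM can significantly speed up model inference over the standard approach (up to 2x) and prior cache designs (up to 1.5x), with acceptable accuracy loss.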
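The cache-and-exit idea in technique (1) can be illustrated with a minimal sketch. It is not SMTM's implementation: the cosine similarity measure, the fixed `threshold`, the flat dictionary cache, and the names `cosine_similarity` and `cache_lookup` are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Optional, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def cache_lookup(
    semantic_vector: List[float],
    cache: Dict[str, List[float]],
    threshold: float,
) -> Optional[Tuple[str, float]]:
    """Return (label, similarity) of the closest cached entry if it clears the threshold.

    None means "no confident hit": the caller would keep running the
    remaining CNN layers instead of exiting early.
    """
    best_label: Optional[str] = None
    best_sim = -1.0
    for label, cached_vector in cache.items():
        sim = cosine_similarity(semantic_vector, cached_vector)
        if sim > best_sim:
            best_label, best_sim = label, sim
    if best_label is not None and best_sim >= threshold:
        return best_label, best_sim
    return None


# Hypothetical cache holding one low-dimensional semantic vector per class.
cache = {"cat": [1.0, 0.0, 0.0], "dog": [0.0, 1.0, 0.0]}
hit = cache_lookup([0.9, 0.1, 0.0], cache, threshold=0.95)   # close to "cat"
miss = cache_lookup([0.5, 0.5, 0.5], cache, threshold=0.95)  # ambiguous, no exit
```

In this sketch a hit lets inference stop at the current layer and reuse the cached label, while a miss falls through to the rest of the network; a lower threshold trades accuracy for more early exits.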
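Human gait is a daily motion that not only represents mobility but can also be used to identify the walker, by either human observers or computers. Recent studies reveal that gait even conveys information about the walker's emotion. Individuals in different emotional states may show different gait patterns. The mapping between various emotions and gait patterns provides a new source for automated emotion recognition. Compared with traditional biometrics for emotion detection, such as facial expression, speech, and physiological parameters, gait can be observed remotely, is harder to imitate, and requires less cooperation from the subject. These advantages make gait a promising source for emotion detection. This article reviews current research on gait-based emotion detection, particularly on how gait parameters are affected by different emotional states and how emotional states can be recognized from distinct gait patterns. We focus on the detailed methods and techniques applied throughout the emotion recognition process: data collection, preprocessing, and classification. Finally, we discuss possible directions for efficient and effective gait-based emotion recognition using state-of-the-art techniques in intelligent computation and big data.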
Partial differential equations (PDEs) are widely used to describe physical and engineering phenomena. Some key parameters in PDEs, which represent physical properties with important scientific interpretations, are difficult or even impossible to measure directly. Estimating these parameters from noisy and sparse experimental data on related physical quantities is an important task. Many methods for PDE parameter inference require a large number of evaluations of the numerical PDE solution through algorithms such as the finite element method, which can be time-consuming, especially for nonlinear PDEs. In this paper, we propose a novel method for estimating unknown parameters in PDEs, called PDE-Informed Gaussian Process Inference (PIGPI). By modeling the PDE solution as a Gaussian process (GP), we derive the manifold constraints induced by the (linear) PDE structure such that, under these constraints, the GP satisfies the PDE. For nonlinear PDEs, we propose an augmentation method that transforms the nonlinear PDE into an equivalent PDE system that is linear in all derivatives and that PIGPI can handle. PIGPI can be applied to multi-dimensional PDE systems and to PDE systems with unobserved components. The method completely bypasses the numerical PDE solver, thus achieving drastic savings in computation time, especially for nonlinear PDEs. Moreover, PIGPI provides uncertainty quantification for both the unknown parameters and the PDE solution. The proposed method is demonstrated on several application examples from different areas.
The proliferation of automatic faithfulness metrics for summarization has produced a need for benchmarks to evaluate them. While existing benchmarks measure the correlation with human judgements of faithfulness on model-generated summaries, they are insufficient for diagnosing whether metrics are: 1) consistent, i.e., decrease as errors are introduced into a summary, 2) effective on human-written texts, and 3) sensitive to different error types (as summaries can contain multiple errors). To address these needs, we present a benchmark of unfaithful minimal pairs (BUMP), a dataset of 889 human-written, minimally different summary pairs, where a single error (from an ontology of 7 types) is introduced into a summary from the CNN/DailyMail dataset to produce an unfaithful summary. We find that BUMP complements existing benchmarks in several ways: 1) the summaries in BUMP are harder to discriminate and less probable under SOTA summarization models, 2) BUMP enables measuring the consistency of metrics and reveals that the most discriminative metrics tend not to be the most consistent, and 3) BUMP enables measuring metrics' performance on individual error types and highlights areas of weakness for future work.
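As a rough illustration of the consistency criterion (a metric should score the unfaithful member of a minimal pair lower), the following sketch computes the fraction of pairs on which a metric prefers the faithful summary. The `token_overlap` metric, the `consistency` function, and the example triples are hypothetical stand-ins, not the metrics or data evaluated on BUMP.

```python
from typing import Callable, List, Tuple

# (source document, faithful summary, minimally edited unfaithful summary)
Triple = Tuple[str, str, str]


def token_overlap(source: str, summary: str) -> float:
    """Toy faithfulness metric: share of summary tokens that appear in the source."""
    source_tokens = set(source.lower().split())
    summary_tokens = summary.lower().split()
    if not summary_tokens:
        return 0.0
    return sum(tok in source_tokens for tok in summary_tokens) / len(summary_tokens)


def consistency(metric: Callable[[str, str], float], triples: List[Triple]) -> float:
    """Fraction of minimal pairs where the faithful summary scores strictly higher."""
    if not triples:
        raise ValueError("need at least one minimal pair")
    wins = sum(
        metric(source, faithful) > metric(source, unfaithful)
        for source, faithful, unfaithful in triples
    )
    return wins / len(triples)


# Invented examples: each unfaithful summary differs by a single word.
triples = [
    (
        "the mayor opened the new bridge on monday",
        "the mayor opened the bridge",
        "the governor opened the bridge",
    ),
    ("sales rose in march", "sales rose in march", "sales fell in march"),
]
score = consistency(token_overlap, triples)
```

Grouping the triples by error type before calling `consistency` would give a per-type breakdown of the kind the abstract describes.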
The dominant multi-camera 3D detection paradigm is based on explicit 3D feature construction, which requires complicated indexing of local image-view features via 3D-to-2D projection. Other methods implicitly introduce geometric positional encoding and perform global attention (e.g., PETR) to build the relationship between image tokens and 3D objects. The 3D-to-2D perspective inconsistency and global attention lead to a weak correlation between foreground tokens and queries, resulting in slow convergence. We propose Focal-PETR, which uses instance-guided supervision and a spatial alignment module to adaptively focus object queries on discriminative foreground regions. Focal-PETR additionally introduces a down-sampling strategy to reduce the cost of global attention. Owing to its highly parallelized implementation and down-sampling strategy, our model, without depth supervision, achieves leading performance on the large-scale nuScenes benchmark and a superior speed of 30 FPS on a single RTX3090 GPU. Extensive experiments show that our method outperforms PETR while consuming 3x fewer training hours. The code will be made publicly available.
Neuroimaging-based prediction methods for intelligence and cognitive abilities have developed rapidly in the literature. Among different neuroimaging modalities, prediction based on functional connectivity (FC) has shown great promise. Most of the literature has focused on prediction using static FC, but there are limited investigations of the merits of such analysis compared with prediction based on dynamic FC or region-level functional magnetic resonance imaging (fMRI) time series that encode temporal variability. To account for the temporal dynamics in fMRI data, we propose a deep neural network based on bi-directional long short-term memory (bi-LSTM) that also incorporates a feature selection mechanism. The proposed pipeline is implemented via an efficient GPU computation framework and applied to predict intelligence scores based on region-level fMRI time series as well as dynamic FC. We compare the prediction performance for different intelligence measures based on static FC, dynamic FC, and region-level time series acquired from the Adolescent Brain Cognitive Development (ABCD) study involving close to 7000 individuals. Our detailed analysis shows that static FC consistently has inferior prediction performance compared with region-level time series or dynamic FC for unimodal rest and task fMRI experiments, and in almost all cases when using a combination of task and rest features. In addition, the proposed bi-LSTM pipeline based on region-level time series identifies several shared and differential important brain regions across task and rest fMRI experiments that drive intelligence prediction. A test-retest analysis of the selected features shows strong reliability across cross-validation folds. Given the large sample size of the ABCD study, our results provide strong evidence that superior prediction of intelligence can be achieved by accounting for temporal variations in fMRI.
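The distinction between static and dynamic FC can be made concrete with a small sketch. It assumes the common sliding-window Pearson correlation construction of dynamic FC; the abstract does not state how dynamic FC is computed in the paper, and the function names, window length, stride, and toy data are illustrative only.

```python
import math
from typing import List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def dynamic_fc(series: List[List[float]], window: int, stride: int) -> List[List[List[float]]]:
    """One region-by-region correlation matrix per sliding window.

    series[r] holds the time series of brain region r.
    """
    n_time = len(series[0])
    matrices = []
    for start in range(0, n_time - window + 1, stride):
        chunk = [region[start:start + window] for region in series]
        matrices.append([[pearson(a, b) for b in chunk] for a in chunk])
    return matrices


# Toy data: two regions that are coupled at first and anti-coupled later.
series = [
    [1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0, 0.0],
    [2.0, 4.0, 6.0, 8.0, 1.0, 2.0, 3.0, 4.0],
]
windows = dynamic_fc(series, window=4, stride=4)  # two non-overlapping windows
static = pearson(series[0], series[1])            # single whole-scan estimate
```

Here the two windows give opposite couplings between the regions, whereas the static estimate collapses them into a single number; this is the kind of temporal variability that the time-series and dynamic FC inputs retain.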
Underwater automatic target recognition (UATR) has been a challenging research topic in ocean engineering. Although deep learning has brought opportunities for target recognition on land and in the air, underwater target recognition techniques based on deep learning have lagged behind because of limited sensor performance and the small amount of trainable data. This letter proposes a framework for learning the visual representation of underwater acoustic imagery, built around a transformer-based style transfer model. The model replaces the low-level texture features of optical images with the visual features of underwater acoustic imagery while preserving their original high-level semantic content. The proposed framework can exploit rich optical image datasets to generate a pseudo-acoustic image dataset and use it as the initial sample set for training an underwater acoustic target recognition model. The experiments use dual-frequency identification sonar (DIDSON) as the underwater acoustic data source and fish, the most common marine creature, as the research subject. Experimental results show that the proposed method can generate high-quality, high-fidelity pseudo-acoustic samples, achieve acoustic data augmentation, and support research on domain transfer between underwater acoustic and optical images.
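When switching between different locomotion modes (e.g., stair ascent/descent, ramp ascent/descent), powered prosthetic legs must anticipate the user's intent. Many data-driven classification techniques have demonstrated promising results for predicting user intent, but the performance of these intent prediction models on novel subjects remains unsatisfactory. In other domains (e.g., image classification), transfer learning has improved accuracy by taking features previously learned from a large dataset (i.e., a pre-trained model) and transferring the learned model to a new task for which limited data are available. In this paper, we develop a deep convolutional neural network with intra-subject (subject-dependent) and inter-subject (subject-independent) validation based on a human locomotion dataset. We then apply transfer learning to the subject-independent model using a small portion (10%) of the data from the left-out subject. We compare the performance of these three models. Our results indicate that the transfer learning (TL) model outperforms the subject-independent (IND) model and is comparable to the subject-dependent (DEP) model (DEP error: 0.74 $\pm$ 0.002%, IND error: 11.59 $\pm$ 0.076%, TL error: 3.57 $\pm$ 0.02% with 10% of the data). Moreover, as expected, transfer learning accuracy increases as more data from the left-out subject become available. We also evaluate the performance of the intent prediction system across the various sensor configurations that may be available in a prosthetic leg application. Our results suggest that a thigh IMU on the prosthesis is sufficient to predict locomotion intent in practice.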
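Learning-based visual odometry (VO) algorithms achieve remarkable performance on common static scenes, benefiting from high-capacity models and large amounts of annotated data, but they tend to fail in dynamic, populated environments. Semantic segmentation is mainly used to discard dynamic associations before estimating camera motion, but this comes at the cost of discarding static features and is hard to scale to unseen categories. In this paper, we leverage the mutual dependence between camera ego-motion and motion segmentation and show that both can be jointly refined in a single learning-based framework. In particular, we present DytanVO, the first learning-based VO method that deals with dynamic environments. It takes two consecutive monocular frames in real time and predicts camera ego-motion in an iterative fashion. Our method achieves an average improvement of 27.7% over state-of-the-art VO solutions in real-world dynamic environments, and even performs competitively among dynamic visual SLAM systems that optimize the trajectory on the back end. Experiments on a large number of unseen environments also demonstrate the generalizability of our method.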